ops-catalog: materialize stats to Bigtable by williamhbaker · Pull Request #2955 · estuary/flow

williamhbaker · 2026-05-18T16:48:52Z

Description:

Create a minimal Rust read client for Bigtable catalog stats. As of these changes it isn't wired up to anything, but it was developed alongside prototypes of several other uses of catalog stats in the codebase: data-movement-stalled alerts and abandoned task detection (control plane agent), the OpenMetrics API, billing calculations, and speculative UI capabilities.
Run a Bigtable emulator locally for local stats, and materialize stats into it in parallel with the Postgres materialization. Some basic instructions are included for verifying that the two databases are equivalent.
Add the new L2 stats derivation (now partitioned by grain) to the update_l2_reporting control plane API. This is needed for the local Bigtable stats materialization to work. Once the control plane API is deployed with this code, it will also publish the new L2 derivation in production.

Per the above, before deploying the new control plane API with this code, we should create the new reporting-dedicated data plane. Then, just after the control plane API is deployed, call the update_l2_reporting endpoint with the default data plane parameter set to that reporting data plane, so that the new L2 derivation is created there.

The new catalog-stats client and its integration tests need a BigTable endpoint; provide one via the cloud-sdk emulator wrapped in a systemd unit and a mise task. The local control-plane agent depends on the emulator so the dual-read path in `PGControlPlane` has somewhere to go, and platform-test runs `local:bigtable` before nextest so the same integration tests run in CI.

jgraettinger · 2026-05-20T20:39:49Z

+        self.read_rows(grain, row_set, vec![])
+    }
+
+    /// Streams every `catalog_stats_<grain>` row whose `catalog_name` stats


Suggested change

/// Streams every `catalog_stats_<grain>` row whose `catalog_name` stats

/// Streams every `catalog_stats_<grain>` row whose `catalog_name` starts

jgraettinger · 2026-05-20T20:49:38Z

+                            retries = 0;
+                        }
+                        ReadResult::Retry(status) => {
+                            if retries >= MAX_RETRIES {


Consider using the gazette crate strategy of having the caller be responsible for retry policy, with a wrapping result type that indicates number of attempts: https://github.com/estuary/flow/blob/master/crates/gazette/src/journal/read/mod.rs#L55-L58
This also lets the caller decide what to do with logging of the actual error.
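
A minimal sketch of that caller-owned retry shape, using only the standard library. The names `RetryError`, `RetryResult`, `RetryingReader`, `backoff`, and `collect_with_policy` are illustrative assumptions, not the actual types in this PR or in the gazette crate. The reader only counts consecutive failures and surfaces each one; the caller decides whether to log, back off, or stop.

```rust
use std::time::Duration;

/// Hypothetical wrapper: an error plus the number of consecutive failed
/// attempts that preceded it. The reader never gives up on its own.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryError<E> {
    pub attempt: usize,
    pub inner: E,
}

pub type RetryResult<T, E> = Result<T, RetryError<E>>;

/// Wraps a fallible read function and annotates failures with an attempt count.
/// `read` returns `None` when the underlying source is exhausted.
pub struct RetryingReader<F> {
    read: F,
    attempt: usize,
}

impl<F> RetryingReader<F> {
    pub fn new(read: F) -> Self {
        Self { read, attempt: 0 }
    }
}

impl<T, E, F> Iterator for RetryingReader<F>
where
    F: FnMut() -> Option<Result<T, E>>,
{
    type Item = RetryResult<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        match (self.read)()? {
            Ok(row) => {
                self.attempt = 0; // Any success resets the failure streak.
                Some(Ok(row))
            }
            Err(inner) => {
                let attempt = self.attempt;
                self.attempt += 1;
                Some(Err(RetryError { attempt, inner }))
            }
        }
    }
}

/// Example caller-side backoff: 100ms doubling per attempt, capped at 5s.
pub fn backoff(attempt: usize) -> Duration {
    let millis = 100u64 << attempt.min(6);
    Duration::from_millis(millis.min(5_000))
}

/// Example caller-side policy: tolerate up to `max_attempts` consecutive
/// failures, then return the last error to the caller's own call stack.
pub fn collect_with_policy<T, E>(
    reader: impl Iterator<Item = RetryResult<T, E>>,
    max_attempts: usize,
) -> RetryResult<Vec<T>, E> {
    let mut rows = Vec::new();
    for item in reader {
        match item {
            Ok(row) => rows.push(row),
            Err(err) if err.attempt >= max_attempts => return Err(err),
            Err(err) => {
                // A real caller would log `err.inner` and sleep for this long.
                let _delay = backoff(err.attempt);
            }
        }
    }
    Ok(rows)
}
```

With this split, a `MAX_RETRIES`-style constant lives in the caller rather than in the read loop, so different callers can choose different limits and logging.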

jgraettinger · 2026-05-20T20:51:40Z

+        grain: Grain,
+        row_set: bt::RowSet,
+        additional_filters: Vec<bt::RowFilter>,
+    ) -> impl futures_core::Stream<Item = anyhow::Result<CatalogStats>> + '_ {


Crates like gazette, tokens, and the V2 runtime use tonic::Result over anyhow::Result consistently for all streamed results and IPC (the V1 runtime does as well now, IIRC, with a couple of exceptions like derive-sqlite).
I've found over time that tonic::Status is nicer to work with as a fundamental type for protocol errors -- for one, it's Clone, where anyhow::Error isn't.
The rule of thumb I've adhered to is anyhow::Result for errors within a current application call stack, converted to/from tonic::Status when communicating that error from/to other systems and inter-process services.
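
The boundary conversion described here can be sketched with standard-library stand-ins. In this sketch `Status` and `Code` play the role of `tonic::Status` and its code, and `AppError` plays the role of an application-internal error such as `anyhow::Error`; all of these names, and the `fetch_internal` and `fetch_for_stream` functions, are hypothetical and exist only for illustration.

```rust
use std::fmt;

/// Stand-in for a protocol-level status: a code plus a message.
/// It is `Clone`, so one failure can be handed to several consumers.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub code: Code,
    pub message: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Code {
    NotFound,
    Unavailable,
}

/// Stand-in for an internal error type. It wraps `std::io::Error`, which is
/// not `Clone`, so `AppError` cannot be cloned either.
#[derive(Debug)]
pub enum AppError {
    MissingRow(String),
    Transport(std::io::Error),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::MissingRow(name) => write!(f, "no stats row for {name}"),
            AppError::Transport(err) => write!(f, "transport failure: {err}"),
        }
    }
}

impl std::error::Error for AppError {}

/// The conversion happens once, at the point where the error leaves the
/// current call stack and becomes part of a streamed or IPC result.
impl From<AppError> for Status {
    fn from(err: AppError) -> Self {
        let code = match &err {
            AppError::MissingRow(_) => Code::NotFound,
            AppError::Transport(_) => Code::Unavailable,
        };
        Status { code, message: err.to_string() }
    }
}

/// Internal helper: uses the rich, non-clonable error type.
pub fn fetch_internal(name: &str) -> Result<u64, AppError> {
    match name {
        "acmeCo/orders" => Ok(42),
        "acmeCo/down" => Err(AppError::Transport(std::io::Error::new(
            std::io::ErrorKind::ConnectionReset,
            "connection reset",
        ))),
        other => Err(AppError::MissingRow(other.to_string())),
    }
}

/// Boundary function: what a streamed item would carry.
pub fn fetch_for_stream(name: &str) -> Result<u64, Status> {
    fetch_internal(name).map_err(Status::from)
}
```

The values returned by `fetch_internal` are made up for the example; the point is only where the conversion sits.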

jgraettinger · 2026-05-20T21:05:07Z

+        row_set: bt::RowSet,
+        additional_filters: Vec<bt::RowFilter>,
+    ) -> Self {
+        // `CellsPerColumnLimitFilter(1)`: materialize-bigtable writes


Oh, fantastic that it has MVCC built in! So we don't even need to represent multiple versions ourselves...
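
For readers unfamiliar with the filter, here is a small in-memory model of what a cells-per-column limit of 1 does: Bigtable can hold several timestamped cells per column, and the filter keeps only the most recent N. The `Cell` struct and `cells_per_column_limit` function below are hypothetical and do not touch the Bigtable API; they only illustrate the semantics.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of one versioned cell in a row.
#[derive(Debug, Clone, PartialEq)]
pub struct Cell {
    pub column: String,
    pub timestamp_micros: i64,
    pub value: String,
}

/// Keeps at most `limit` cells per column, newest first, mirroring the
/// effect of a cells-per-column limit filter on a single row.
pub fn cells_per_column_limit(cells: &[Cell], limit: usize) -> Vec<Cell> {
    // Group cell versions by column name.
    let mut by_column: BTreeMap<&str, Vec<&Cell>> = BTreeMap::new();
    for cell in cells {
        by_column.entry(cell.column.as_str()).or_default().push(cell);
    }

    let mut out = Vec::new();
    for versions in by_column.values_mut() {
        // Newest timestamp first, then keep only the first `limit` versions.
        versions.sort_by(|a, b| b.timestamp_micros.cmp(&a.timestamp_micros));
        out.extend(versions.iter().take(limit).map(|c| Cell::clone(c)));
    }
    out
}
```

With a limit of 1, a reader sees exactly one value per column even if the writer has stored several versions.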

jgraettinger · 2026-05-20T21:15:39Z

        );
    };

+    let models::DeriveUsing::Typescript(models::DeriveUsingTypescript {


Good, I'm glad this is using TypeScript (I was thinking about it the other day and couldn't recall which derivation runtime it used).

TypeScript should be easy(*) to scale using the V2 runtime. We'll be able to support SQLite derivations in V2 and run them as we do today, but not scale them out for the time being (until we meaningfully tackle block devices).

ops: add catalog_stats serde types for the L2 rollup document

7955d23

Move from its previous location in control-plane-api to the ops crate so the upcoming Bigtable read client can share the definition.

williamhbaker force-pushed the wb/stats-next branch from 9dd17d1 to 3c7804d Compare May 18, 2026 17:13

williamhbaker added 3 commits May 18, 2026 17:38

catalog-stats: BigTable read client for catalog-stats rollups

9619aba

Read side of the `catalog_stats_<grain>` tables that the new stats-view materialization will write. Wraps the BigTable data API behind name/range/prefix fetchers, each returning a coroutine-backed stream.
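
As an illustration of how a prefix fetcher can reduce to a range fetcher, the sketch below computes the half-open key range that covers every key starting with a given prefix. It assumes row keys begin with the catalog name as raw bytes, which is an assumption about the key layout and not something this PR description states; `prefix_end` and `in_prefix_range` are hypothetical helper names.

```rust
/// Returns the smallest key that sorts strictly after every key beginning
/// with `prefix`, or `None` when no such bound exists (empty prefix, or a
/// prefix made entirely of 0xFF bytes), meaning the scan is unbounded above.
pub fn prefix_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    // Drop trailing 0xFF bytes, then increment the last remaining byte.
    while let Some(last) = end.pop() {
        if last < 0xFF {
            end.push(last + 1);
            return Some(end);
        }
    }
    None
}

/// True when `key` falls inside the half-open range [prefix, prefix_end).
pub fn in_prefix_range(key: &[u8], prefix: &[u8]) -> bool {
    key >= prefix && prefix_end(prefix).map_or(true, |end| key < end.as_slice())
}
```

For example, the prefix `acmeCo/` yields the exclusive end key `acmeCo0`, because `0` is the byte immediately after `/`.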

ops-catalog: new ops/rollups/L2/catalog-stats derivation

2373c29

`update_l2_reporting` populates the new derivation in parallel with the old one and stamps each L1 source with `not_before` = today so the shadow doesn't backfill history during rollout.

williamhbaker force-pushed the wb/stats-next branch from 3c7804d to 2373c29 Compare May 18, 2026 17:39

williamhbaker marked this pull request as ready for review May 18, 2026 18:05

williamhbaker requested a review from jgraettinger May 18, 2026 18:05

jgraettinger approved these changes May 20, 2026

williamhbaker wants to merge 4 commits into master from wb/stats-next
